EnsNet: Ensconce Text in the Wild
A new method is proposed for removing text from natural images. The challenge
is to first accurately localize text at the stroke level and then replace it
with a visually plausible background. Unlike previous methods that require
image patches to erase scene text, our method, namely ensconce network
(EnsNet), can operate end-to-end on a single image without any prior knowledge.
The overall structure is an end-to-end trainable FCN-ResNet-18 network with a
conditional generative adversarial network (cGAN). The features of the former are
first enhanced by a novel lateral connection structure and then refined by four
carefully designed losses: multiscale regression loss and content loss, which
capture the global discrepancy of different level features; texture loss and
total variation loss, which primarily target filling the text region and
preserving the reality of the background. The latter is a novel local-sensitive
GAN, which attentively assesses the local consistency of the text erased
regions. Both qualitative and quantitative sensitivity experiments on synthetic
images and the ICDAR 2013 dataset demonstrate that each component of the EnsNet
is essential to achieve a good performance. Moreover, our EnsNet can
significantly outperform previous state-of-the-art methods in terms of all
metrics. In addition, a qualitative experiment conducted on the SMBNet dataset
further demonstrates that the proposed method can also perform well on general
object (such as pedestrian) removal tasks. EnsNet is extremely fast, running at
333 fps on an i5-8600 CPU device.
Comment: 8 pages, 8 figures, 2 tables, accepted to appear in AAAI 201
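The total variation loss mentioned above is a standard smoothness prior that discourages noisy artifacts in the inpainted background. A minimal sketch of that one term (an assumption for illustration; the paper combines it with the multiscale regression, content, and texture losses):

```python
import numpy as np

def total_variation_loss(img):
    """Total variation: sum of absolute differences between neighbouring
    pixels, encouraging a smooth, artifact-free inpainted background."""
    dh = np.abs(np.diff(img, axis=0)).sum()  # vertical neighbour differences
    dw = np.abs(np.diff(img, axis=1)).sum()  # horizontal neighbour differences
    return dh + dw

# A perfectly flat image has zero total variation.
flat = np.ones((4, 4))
print(total_variation_loss(flat))  # 0.0
```

In a training loop this term would typically be weighted and added to the other losses, so smoothness does not dominate the reconstruction objective.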
SynSig2Vec: Learning Representations from Synthetic Dynamic Signatures for Real-world Verification
An open research problem in automatic signature verification is defending
against skilled forgery attacks. However, skilled forgeries are very difficult
to acquire for representation learning. To tackle this issue, this paper proposes to learn
dynamic signature representations through ranking synthesized signatures.
First, a neuromotor inspired signature synthesis method is proposed to
synthesize signatures with different distortion levels for any template
signature. Then, given the templates, we construct a lightweight
one-dimensional convolutional network to learn to rank the synthesized samples,
and directly optimize the average precision of the ranking to exploit relative
and fine-grained signature similarities. Finally, after training, fixed-length
representations can be extracted from dynamic signatures of variable lengths
for verification. One highlight of our method is that it requires neither
skilled nor random forgeries for training, yet it surpasses the
state-of-the-art by a large margin on two public benchmarks.
Comment: To appear in AAAI 202
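The quantity being optimised above, average precision of a ranked list, can be illustrated with the plain (non-differentiable) metric itself; the paper's training surrogate is more involved, so this is only a sketch of what is being maximised:

```python
def average_precision(labels):
    """Average precision of a ranked list, where labels[i] is 1 if the item
    at rank i is relevant (e.g. a low-distortion synthetic signature ranked
    above heavily distorted ones)."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / max(hits, 1)

print(average_precision([1, 1, 0, 0]))  # 1.0 — all relevant items ranked first
print(average_precision([0, 1]))        # 0.5 — relevant item ranked second
```

Directly optimising this ranking objective exploits relative, fine-grained similarities between samples rather than hard genuine/forgery labels.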
SPTS: Single-Point Text Spotting
Existing scene text spotting (i.e., end-to-end text detection and
recognition) methods rely on costly bounding box annotations (e.g., text-line,
word-level, or character-level bounding boxes). For the first time, we
demonstrate that training scene text spotting models can be achieved with an
extremely low-cost annotation of a single point for each instance. We propose
an end-to-end scene text spotting method that tackles scene text spotting as a
sequence prediction task. Given an image as input, we formulate the desired
detection and recognition results as a sequence of discrete tokens and use an
auto-regressive Transformer to predict the sequence. The proposed method is
simple yet effective, which can achieve state-of-the-art results on widely used
benchmarks. Most significantly, we show that performance is not very
sensitive to the position of the point annotation, meaning that it is much
easier to annotate, or even to generate automatically, than a bounding box,
which requires precise positions. We believe that such a pioneering attempt
indicates a significant opportunity for scene text spotting applications at a
much larger scale than previously possible. The code will be made publicly
available.
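The sequence-prediction formulation above can be illustrated with a toy tokenisation: quantise the single point into discrete coordinate bins, then append the transcription characters, yielding the target sequence an auto-regressive decoder would predict. The token names and binning here are illustrative assumptions, not the paper's exact vocabulary:

```python
def instance_to_tokens(x, y, text, n_bins=1000):
    """Turn one text instance (normalised point + transcription) into a
    discrete token sequence. Coordinates in [0, 1] are quantised into
    n_bins bins; characters become one token each. Token names are
    hypothetical, chosen only to illustrate the formulation."""
    tokens = [
        f"<x_{int(x * (n_bins - 1))}>",  # quantised x-coordinate token
        f"<y_{int(y * (n_bins - 1))}>",  # quantised y-coordinate token
    ]
    tokens += list(text)                 # one token per character
    return tokens

print(instance_to_tokens(0.5, 0.25, "hi"))
# ['<x_499>', '<y_249>', 'h', 'i']
```

Concatenating such per-instance sequences (plus an end-of-sequence token) gives a single target stream, so detection and recognition reduce to next-token prediction.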
Automatic labeling of large amounts of handwritten characters with gate-guided dynamic deep learning
SVC-onGoing: Signature verification competition
This article presents SVC-onGoing, an ongoing competition for on-line signature verification where researchers can easily benchmark their systems against the state of the art on an open common platform, using large-scale public databases such as DeepSignDB and SVC2021_EvalDB together with standard experimental protocols. SVC-onGoing is based on the ICDAR 2021 Competition on On-Line Signature Verification (SVC 2021), which has been extended to accept participants at any time. The goal of SVC-onGoing is to evaluate the limits of on-line signature verification systems on popular scenarios (office/mobile) and writing inputs (stylus/finger) through large-scale public databases. Three different tasks are considered in the competition, simulating realistic scenarios, as both random and skilled forgeries are considered simultaneously in each task. The results obtained in SVC-onGoing prove the high potential of deep learning methods in comparison with traditional methods. In particular, the best signature verification system has obtained Equal Error Rate (EER) values of 3.33% (Task 1), 7.41% (Task 2), and 6.04% (Task 3). Future studies in the field should be oriented toward improving the performance of signature verification systems in the challenging mobile scenarios of SVC-onGoing, in which several mobile devices and the finger are used during signature acquisition.
This work has been supported by projects: PRIMA (H2020-MSCA-ITN-2019-860315), TRESPASS-ETN (H2020-MSCA-ITN-2019-860813), INTER-ACTION (PID2021-126521OB-I00 MICINN/FEDER), Orange Labs, and by UAM-Cecaban
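The Equal Error Rate reported above is the operating point where the false-acceptance and false-rejection rates coincide. A minimal sketch over raw similarity scores (real evaluations typically interpolate between thresholds rather than sweeping only observed scores):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep decision thresholds over the observed similarity scores and
    return the error rate where false-reject and false-accept rates are
    closest to equal."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    best_gap, eer = float("inf"), 1.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)    # genuine signatures rejected
        far = np.mean(impostor >= t)  # forgeries accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Perfectly separated scores give an EER of 0.
print(equal_error_rate([0.7, 0.8, 0.9], [0.1, 0.2, 0.3]))  # 0.0
```

A lower EER means genuine and impostor score distributions overlap less, which is why it serves as the single headline number for each competition task.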